Annotated English

Author

José Hernández-Orallo

Abstract

This document presents Annotated English, a system of diacritical symbols which turns English pronunciation into a precise and unambiguous process. The annotations are defined and placed in such a way that the original English text is not altered (not even a letter), thus allowing for consistent reading and learning of the English language with and without annotations. The annotations are based on a set of general rules that keeps the frequency of annotations from being dramatically high (despite the chaotic orthography of English). This lets the reader easily associate annotations with exceptions, and makes it possible to shape, internalise and consolidate some rules for the English language which are otherwise weakened by the enormous number of exceptions in English pronunciation.

The advantages of this annotation system are manifold. Any existing text can be annotated without a significant increase in size, so an annotated version of any document or book can have the same number of pages and the same font size. Because no letter is affected, a person who does not know the annotation rules can still read the text perfectly well by simply ignoring the annotations. The annotations are based on a set of rules which can be progressively learned and recognised, even when the reader has no access to the rules or no time to read them. This means that a reader can understand most of the annotations after reading a few pages of Annotated English, and can take advantage of that knowledge for any other annotated document she may read in the future.

These features pave the way for multiple applications. Annotated English can be used as a tool for teachers and parents when English-speaking children learn English orthography or simply when they start reading. Annotated textbooks, tales, dictionaries and any other material can make English orthography less painful. Annotated English can also be very useful for students of English as a foreign language, for whom pronunciation is terribly troublesome: they need to learn the meaning, the spelling and the pronunciation of each new word, and pronunciation does not improve at all through reading. This is so because we can only see the spelling and infer the meaning, but books do not provide the correct pronunciation of each word. In fact, incorrect pronunciation (we are not referring here to a bad accent or intonation) persists in people who have lived in an English-speaking country for decades, because the true pronunciation is never shown (unless the IPA transcription is looked up in a dictionary), just heard in different contexts and situations. Finally, Annotated English can also be very practical for any regular user (native or not) of the English language, particularly when they face new (especially technical) words and have doubts about their pronunciation.

In this document we introduce the rules for annotations and their symbols. This document is not intended for a general audience, and the explanations focus on conveying precisely how the annotation system works. If the system has to be explained to a final user, a shorter and simpler manual should be issued for that purpose. In any case, the best way to learn Annotated English is to look at the examples. There is a section in this document with some annotated texts, and it is recommended to take a quick look at them before getting into the details.

Similar resources

Mobile, L2 vocabulary learning, and fighting illiteracy: A case study of Iranian semi-illiterates beyond transition level

As mobile learning simultaneously employs both handheld computers and mobile telephones and other devices that draw on the same set of functionalities, it throws open the door for swift connection between learners and teachers. This study examined and articulated the impact of the application of mobile devices for teaching English vocabulary items to 123 Iranian semi-illitera...

WikiCoref: An English Coreference-annotated Corpus of Wikipedia Articles

This paper presents WikiCoref, an English corpus annotated for anaphoric relations, where all documents are from the English version of Wikipedia. Our annotation scheme follows the one of OntoNotes with a few disparities. We annotated each markable with coreference type, mention type and the equivalent Freebase topic. Since most similar annotation efforts concentrate on very specific types of w...

POS-Tagger for English-Vietnamese Bilingual Corpus

Corpus-based Natural Language Processing (NLP) tasks for such popular languages as English, French, etc. have been well studied with satisfactory achievements. In contrast, corpus-based NLP tasks for unpopular languages (e.g. Vietnamese) are at a deadlock due to absence of annotated training data for these languages. Furthermore, hand-annotation of even reasonably well-determined features such ...

Annotated Corpora for Word Alignment between Japanese and English and its Evaluation with MAP-based Word Aligner

This paper presents two annotated corpora for word alignment between Japanese and English. We annotated on top of the IWSLT-2006 and the NTCIR-8 corpora. The IWSLT-2006 corpus is in the domain of travel conversation while the NTCIR-8 corpus is in the domain of patent. We annotated the first 500 sentence pairs from the IWSLT-2006 corpus and the first 100 sentence pairs from the NTCIR-8 corpus. A...

Exploiting parallel texts in the creation of multilingual semantically annotated resources: the MultiSemCor Corpus

In this article we illustrate and evaluate an approach to create high quality linguistically annotated resources based on the exploitation of aligned parallel corpora. This approach is based on the assumption that if a text in one language has been annotated and its translation has not, annotations can be transferred from the source text to the target using word alignment as a bridge. The trans...

Temporal Information Processing of a New Language: Fast Porting with Minimal Resources

We describe the semi-automatic adaptation of a TimeML annotated corpus from English to Portuguese, a language for which TimeML annotated data was not available yet. In order to validate this adaptation, we use the obtained data to replicate some results in the literature that used the original English data. The fact that comparable results are obtained indicates that our approach can be used su...

Journal title:

CoRR

Volume abs/1012.5962

Publication date 2010
